On differentially private frequent itemset mining

نویسندگان

  • Chen Zeng
  • Jeffrey F. Naughton
  • Jin-Yi Cai
چکیده

We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very difficult, it leaves a glimmer of hope in that our proof of difficulty relies on the existence of long transactions (that is, transactions containing many items). Accordingly, we investigate an approach that begins by truncating long transactions, trading off errors introduced by the truncation with those introduced by the noise added to guarantee privacy. Experimental results over standard benchmark databases show that truncating is indeed effective. Our algorithm solves the "classical" frequent itemset mining problem, in which the goal is to find all itemsets whose support exceeds a threshold. Related work has proposed differentially private algorithms for the top-k itemset mining problem ("find the k most frequent itemsets".) An experimental comparison with those algorithms show that our algorithm achieves better F-score unless k is small.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Study of Differentially Private Frequent Itemset Mining

Frequent sets play an important role in many Data Mining tasks that try to search interesting patterns from databases, such as association rules, sequences, correlations, episodes, classifiers and clusters. FrequentItemsets Mining (FIM) is the most well-known techniques to extract knowledge from dataset. In this paper differential privacy aims to get means to increase the accuracy of queries fr...

متن کامل

Privacy Preserving Private Frequent Itemset Mining via Smart Splitting

Recently there has been a growing interest in designing differentially private data mining algorithms. A variety of algorithms have been proposed for mining frequent itemsets. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. It has practical importance in a wide range of application areas such as decision support, web usage mining, bioinformatics, etc. In th...

متن کامل

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...

متن کامل

CS 730R: Topics in Data and Information Management

1. Summary. In this paper the authors propose a differentially privacy preserving algorithm for mining frequent itemset. This work differs from the other privacy preserving miners present in literature, indeed this algorithm mines the itemset by enforcing cardinality constraints on the transactions present in the dataset. In particular the authors study how the reduction the cardinality of the ...

متن کامل

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

Data Mining and knowledge discovery is one of the important areas. In this paper we are presenting a survey on various methods for frequent pattern mining. From the past decade, frequent pattern mining plays a very important role but it does not consider the weight factor or value of the items. The very first and basic technique to find the correlation of data is Association Rule Mining. In ARM...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • The VLDB journal : very large data bases : a publication of the VLDB Endowment

دوره 6 1  شماره 

صفحات  -

تاریخ انتشار 2012